A novel string distance metric for ranking Persian respelling suggestions

نویسندگان

  • Omid Kashefi
  • Mohsen Sharifi
  • Behrooz Minaie
چکیده

Spelling errors in digital documents are often caused by operational and cognitive mistakes, or by the lack of full knowledge about the language of the written documents. Computer-assisted solutions can help to detect and suggest replacements. In this paper, we present a new string distance metric for the Persian language to rank respelling suggestions of a misspelled Persian word by considering the effects of keyboard layout on typographical spelling errors as well as the homomorphic and homophonic aspects of words for orthographical misspellings. We also consider the misspellings caused by disregarded diacritics. Since the proposed string distance metric is custom-designed for the Persian language, we present the spelling aspects of the Persian language such as homomorphs, homophones, and diacritics. We then present our statistical analysis of a set of large Persian corpora to identify the causes and the types of Persian spelling errors. We show that the proposed string distance metric has a higher mean average precision and a higher mean reciprocal rank in ranking respelling candidates of Persian misspellings in comparison with other metrics such as the Hamming, Levenshtein,

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A novel three-stage distance-based consensus ranking method

In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights ob...

متن کامل

A Study on Exponential Fuzzy Numbers Using alpha-Cuts

In this study a new approach to rank exponential fuzzy numbers using  -cuts is established. The metric distance of the interval numbers is extended to exponential fuzzy numbers. By using the ranking of exponential fuzzy numbers and using  -cuts the critical path of a project network is solved and illustrated by numerical examples. Keywords: Exponential Fuzzy Numbers,  -cuts, Metric Dista...

متن کامل

A Novel String Distance Function based on Most Frequent K Characters

This study aims to publish a novel similarity metric to increase the speed of comparison operations. Also the new metric is suitable for distance-based operations among strings. Most of the simple calculation methods, such as string length are fast to calculate but doesn’t represent the string correctly. On the other hand the methods like keeping the histogram over all characters in the string ...

متن کامل

A Novel String-to-String Distance Measure With Applications to Machine Translation Evaluation

We introduce a string-to-string distance measure which extends the edit distance by block transpositions as constant cost edit operation. An algorithm for the calculation of this distance measure in polynomial time is presented. We then demonstrate how this distance measure can be used as an evaluation criterion in machine translation. The correlation between this evaluation criterion and human...

متن کامل

A novel ranking method for intuitionistic fuzzy set based on information fusion and application to threat assessment

A novel ranking method based on multi-time information fusion is proposed for intuitionistic fuzzy sets (IFSs) and applied to the threat assessment problem, a multi-attribute decision making (MADM) one. This method integrates a designed intuitionistic fuzzy entropy (IFE), the closeness degree of technique for order preference by similarity to ideal solution (TOPSIS), the decision maker¡¯s (DM¡¯...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Natural Language Engineering

دوره 19  شماره 

صفحات  -

تاریخ انتشار 2013